Verifying Collective MPI Calls
Abstract
The collective communication operations of MPI, and MPI operations with non-local semantics in general, require the participating processes to provide consistent parameters, e.g., a unique root process, matching type signatures and amounts of data to be exchanged, or the same operator. Exhaustive consistency checks are typically too expensive to perform under normal use of MPI and would compromise the high-performance optimizations in the collective routines, but inconsistent calls to collective operations can cause confusing and hard-to-find errors (deadlocks, wrong results, or program crashes). We suggest using the MPI profiling interface to provide more extensive semantic checking of calls to MPI routines with collective (non-local) semantics. With this approach, exhaustive semantic checks can be enabled during application development and disabled for production runs. We discuss what can reasonably be checked by such an interface, and mention some inherent limitations of MPI that stand in the way of a fully portable interface for semantic checking. The proposed collective semantics verification interface for the full MPI-2 standard has been implemented for the NEC proprietary MPI/SX and MPI/EX implementations.
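As a rough illustration of the kind of consistency check the abstract describes, the sketch below simulates a broadcast-style collective in plain Python, with no MPI library involved. The names `BcastArgs`, `allreduce`, and `check_bcast` are hypothetical and not taken from the paper, and the type signature is simplified to a (datatype name, count) pair, which is stricter than MPI's actual signature-matching rule. In a real profiling-interface wrapper, the agreement test would presumably be carried out with a reduction over the communicator before the underlying collective is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BcastArgs:
    """Arguments one rank passes to a broadcast (hypothetical, simplified)."""
    count: int
    datatype: str
    root: int


def allreduce(values, op):
    """Stand-in for an allreduce: every rank receives op applied to all contributions."""
    result = op(values)
    return [result] * len(values)


def check_bcast(calls):
    """Check the per-rank arguments of one broadcast call.

    `calls[i]` holds the arguments passed by rank i. Returns a list of
    error messages; an empty list means the call looks consistent.
    """
    errors = []

    # Root check: if the minimum and maximum root over all ranks agree,
    # every rank named the same root.
    roots = [c.root for c in calls]
    lowest = allreduce(roots, min)[0]
    highest = allreduce(roots, max)[0]
    if lowest != highest:
        errors.append(f"root mismatch: roots range from {lowest} to {highest}")
    elif not 0 <= lowest < len(calls):
        errors.append(f"root {lowest} is not a valid rank")

    # Signature check (assumption: modelled as a (datatype, count) pair).
    signatures = {(c.datatype, c.count) for c in calls}
    if len(signatures) > 1:
        errors.append(f"signature mismatch: {sorted(signatures)}")

    return errors
```

Calling `check_bcast` with one `BcastArgs` per simulated rank returns an empty list when all ranks agree, and one message per detected inconsistency otherwise. Checks of this kind cost an extra round of communication per collective call, which is why the abstract proposes enabling them only during development.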
Similar resources
Tuning MPI Collectives by Verifying Performance Guidelines
MPI collective operations provide a standardized interface for performing data movements within a group of processes. The efficiency of collective communication operations depends on the actual algorithm, its implementation, and the specific communication problem (type of communication, message size, number of processes). Many MPI libraries provide numerous algorithms for specific collecti...
PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines
The Message Passing Interface (MPI) is the most commonly used application programming interface for process communication on current large-scale parallel systems. Due to the scale and complexity of modern parallel architectures, it is becoming increasingly difficult to optimize MPI libraries, as many factors can influence the communication performance. To assist MPI developers and users, we pro...
Collective Error Detection for MPI Collective Operations
An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather performance data on MPI programs. Here we present a profiling library whose purpose is to detect user errors in the use of MPI’s collective operations. While some errors can be detected locally (by a single process), other errors ...
A Portable Method for Finding User Errors in the Usage of MPI Collective Operations
An MPI profiling library is a standard mechanism for intercepting MPI calls by applications. Profiling libraries are so named because they are commonly used to gather runtime information about performance characteristics. Here we present a profiling library whose purpose is to detect user errors in the use of MPI’s collective operations. While some errors can be detected locally (by a single pr...
Practical Model-Checking Method for Verifying Correctness of MPI Programs
Formal program verification often requires creating a model of the program and running it through a model-checking tool. However, this model-creation step is itself error prone, tedious, and difficult for someone not familiar with formal verification. In this paper, we describe a tool for verifying correctness of MPI programs that does not require the creation of a model and instead works direc...